Removing the Effects of Random Guessing
From Latent Trait Ability Estimates 

Abstract 

In latent trait models the standard procedure for handling the problem caused by guessing on multiple choice tests is to estimate a parameter which is intended to measure the "guessingness" inherent in an item. Birnbaum's three parameter model, which handles guessing in this manner, ignores individual differences in guessing tendency. This paper presents a model or procedure which uses the information contained in the interaction between a person and an item to remove the effects of random guessing from estimates of ability, difficulty, and discrimination. Simulated and real data are presented which support the model in terms of fit and information.



Removing the Effects of Random Guessing 
From Latent Trait Ability Estimates 
Michael I. Waller
The University of Chicago 

1. Introduction 

It is well known that individuals vary in their tendency to guess randomly on multiple choice tests. With latent trait models the standard procedure for handling random guessing on multiple choice tests is to estimate a parameter which is intended to represent the "guessingness" inherent in an item (see, e.g., Birnbaum, 1968). Such a three parameter or item-guessing model ignores individual variation in guessing tendency. Within classical test theory the "correction for guessing" (see, e.g., Diamond and Evans, 1973) also estimates guessingness, although in this case the estimate is a function of the number of wrong responses made by an individual.

We argue here that, with models designed to estimate ability, there is no need to estimate random guessing behavior and correct for it, whether such behavior is attached to the item or to the person. In either case our primary interest is in estimating ability. The models are intended for that purpose, and our interest in guessing arises only from an interest in eliminating the "noise" it creates in ability estimation. Accordingly, considering the problem in terms of eliminating the noise, rather than estimating guessing and correcting for it, should be more fruitful, and this is the view taken here.

Since a large proportion of guesses occur when low ability subjects meet items which are too difficult for them, Panchapakesan (1969) has suggested omitting low ability subjects entirely when estimating the item parameters. However, these subjects can contribute relevant information concerning easy items. The procedure presented here improves on her idea in two important ways. First, the information contributed by every subject is used during calibration of the instrument, but it is used only at those places where one may be reasonably sure it is valid information. Second, the procedure yields a criterion for measuring the adequacy of this method in accounting for random guessing.

In the present paper we propose a latent trait model or procedure which uses the information contained in the interaction between a person and an item to remove most of the effects of random guessing from estimates of ability (and from estimates of both item parameters, difficulty and discrimination). This is accomplished through a modification of the free response model that removes those item-person interactions in which the item is too difficult for the person and therefore likely to invite guessing. The basic assumptions of latent trait models, unidimensionality and local independence, are also made here.

The statistical procedures we derive for the model include: 1) estimation of the item parameters; 2) estimation of latent ability and measurement error; 3) an item-by-item test of goodness of fit of the model; and 4) an evaluation of the information recovered by a test. The model is equally applicable to the normal or logistic response laws. Although the discussion in this paper is in terms of binary scored items, the model is immediately generalizable to the nominal category scoring model (Bock, 1972), as well as the graded response model (Samejima, 1972).
2. The Data

Suppose that each of $N$ subjects responds to $n$ multiple choice items, item $j$ containing $A_j$ alternatives, $j = 1, \ldots, n$. The response of the $i$th subject to the $j$th item may be thought of as right or wrong. Omitted items are treated as wrong responses. While this treatment of omits is considered a flaw in the three parameter model (Lord, 1968, p. 992), we feel the present model, which considers each item-person interaction separately, is better able to justify such treatment (see section 3).
3. The Response Model

Let $\theta_i$ be a value on the continuum of latent ability underlying the responses to the test items, and let the event that a subject of ability $\theta_i$ responds correctly to item $j$ be denoted $r_{ij} = 1$. Then the free response model is represented by equation (1):

$$\Pr(r_{ij} = 1 \mid \theta_i) = P_{ij} = F(y_{ij}), \tag{1}$$

where either

$$F(y_{ij}) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y_{ij}} e^{-t^2/2}\, dt$$

(normal ogive) or

$$F(y_{ij}) = \frac{\exp(y_{ij})}{1 + \exp(y_{ij})}$$

(logistic), in which $y_{ij} = b_j + a_j \theta_i$.

The quantities $b_j$ and $a_j$ are the item parameters, difficulty and discrimination respectively, associated with item $j$.
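As a concrete illustration, the two response laws of equation (1) might be coded as below. This is a minimal sketch, not the program used in the paper; the function names are hypothetical, and the normal ogive is evaluated through the standard library error function, using $\Phi(y) = \tfrac{1}{2}[1 + \operatorname{erf}(y/\sqrt{2})]$.

```python
import math


def normal_ogive(y: float) -> float:
    """F(y) = (1/sqrt(2*pi)) * integral from -inf to y of exp(-t^2/2) dt."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def logistic(y: float) -> float:
    """F(y) = exp(y) / (1 + exp(y)), arranged to avoid overflow for large |y|."""
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    z = math.exp(y)
    return z / (1.0 + z)


def p_correct(theta: float, a: float, b: float, F=normal_ogive) -> float:
    """Equation (1): P_ij = F(y_ij), with y_ij = b_j + a_j * theta_i."""
    return F(b + a * theta)
```

Because $y_{ij}$ is written in intercept form, a large positive $b_j$ makes an item easy and a large negative $b_j$ makes it hard.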

To obtain estimates of ability removing random guessing, or A.R.R.G. estimates, the required adjustment to the free response model is represented by equation (2):

$$P_{ij} = \begin{cases} F(y_{ij}), & F(y_{ij}) \ge P_c \\ g_{ij}, & F(y_{ij}) < P_c \end{cases} \tag{2}$$

where $g_{ij} = \Pr(\text{person } i \text{ guesses item } j \text{ correctly} \mid F(y_{ij}) < P_c)$ and where $P_c$ is some small probability. What use might be made of the set of items where $F(y_{ij}) < P_c$ is considered elsewhere (Waller, 1974). For our purpose, consider what effect this procedure has on estimation of the principal parameters of the model, $b_j$, $a_j$ and $\theta_i$.

The basic idea is to base the estimate of any person's ability on only those items for which there is a reasonable chance that the person achieved the correct response through the interaction of his ability and the item characteristics. That is, an item which is very difficult for a particular person is an item which invites guessing and therefore is eliminated from consideration in estimating the person's ability (also, that person's response is removed from the sample used to calibrate such an item). Whether or not the person guesses on such an item has no substantial effect on the estimate of his ability, because these item-person interactions are removed from the estimation procedure.

More specifically, we obtain a preliminary estimate of a subject's ability from an approximate transformation (inverse normal or logistic) of his per cent correct. This gives us a rough idea of where the subject belongs on the ability continuum. In each iteration of the estimation procedure the probability of a correct response, $P_{ij}$, is estimated for each subject's response to each item. The A.R.R.G. procedure simply omits from estimation any interaction for which this estimated probability is less than some small probability, the cutoff point $P_c$. An adequate method for determining the appropriate value of $P_c$ is readily available when testing the fit of the model.
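The screening step just described might look like the following sketch. It assumes the normal ogive and a hypothetical continuity adjustment, (number correct + 0.5)/(n + 1), so that zero and perfect scores still map to finite preliminary abilities; the paper does not say how such scores were handled. Items are passed as hypothetical $(a_j, b_j)$ pairs.

```python
import math
from statistics import NormalDist


def normal_ogive(y):
    """Normal ogive F(y)."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def preliminary_ability(responses):
    """Inverse-normal transform of the proportion correct.

    The proportion is computed as (number correct + 0.5) / (n + 1), a hypothetical
    adjustment that keeps zero and perfect scores away from 0 and 1."""
    n = len(responses)
    proportion = (sum(responses) + 0.5) / (n + 1.0)
    return NormalDist().inv_cdf(proportion)


def arrg_mask(theta, items, p_cutoff):
    """True for each (a_j, b_j) whose F(b_j + a_j * theta) is at least p_cutoff,
    i.e. for each interaction kept in estimation; False where it is omitted."""
    return [normal_ogive(b + a * theta) >= p_cutoff for a, b in items]


responses = [1, 1, 0, 0]
items = [(1.0, 2.0), (1.0, 0.0), (1.0, -2.0), (1.0, -4.0)]
theta = preliminary_ability(responses)   # 0.0 for two of four correct
print(arrg_mask(theta, items, 0.05))     # [True, True, False, False]
```

In a full procedure the mask would be recomputed from the current estimates at every iteration rather than only from the preliminary value.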

The method presented here treats omits the same as wrong responses. In support of this treatment it may be argued that there exists a probability, $P_c$, which can be used to divide all responses into two approximate groups: those responses which are made solely on the basis of the subjects' ability, and those responses which for some examinees represent random guessing and therefore, as a group, contribute more noise than information to an estimation procedure. The first group consists of responses which occur when the probability of a correct response, $P_{ij}$, is above the point $P_c$. It is assumed that subjects in this group either know the correct answer or do not, and if not, either omit the question or respond incorrectly due to misinformation. Support for such a model of behavior comes from a recent study by Bock (1972) which shows that the selection of certain wrong alternatives is representative of positive ability. The subject knows enough to choose what he believes to be the correct alternative, but not enough to make the finer discrimination which would enable him to choose what is in fact the correct alternative. If he does not omit the item, his partial knowledge misinforms him and leads him to select an incorrect alternative.

The second group consists of responses in which $P_{ij}$ is less than $P_c$; across a sample of examinees two behaviors are assumed to occur. Non-guessers (or low risk takers) continue to behave as described above for all subjects: they either know the correct answer or do not, and if not, either omit or respond incorrectly. Guessers (or high risk takers) either know the answer or do not; but if not, these subjects will tend to guess, in which case they will be correct $100/A_j$% of the time, where $A_j$ is the number of alternatives. It is assumed that the procedure is robust with respect to minor differences in the point at which individuals may begin to guess randomly.
4. Estimation

There are a number of methods for obtaining estimates of the parameters of latent trait models (see, e.g., Birnbaum, 1968; Bock, 1972; Bock and Lieberman, 1970; Lord, 1968). The method described in the present study may be termed conditional estimation (Bock, 1972). This method as applied to the three parameter item-guessing model is described in Kolakowski and Bock (1970). As the A.R.R.G. procedure for removing the effects of random guessing from latent trait parameter estimates is a modification of the free response model, we first review the estimation procedure for the free response model, from which estimation removing random guessing is easily seen. The estimation procedure is outlined in terms of a general response relation

$$P_{ij} = F(y_{ij}) = F(b_j + a_j \theta_i),$$

where $F$ may be any monotonic function which maps the real line into the unit interval, e.g., the normal ogive or logistic ogive.



Also, when the item-guessing parameter, $g_j$, in the three parameter item-guessing model, $P_{ij} = g_j + (1 - g_j) F(y_{ij})$, is treated as a constant during maximum likelihood estimation, the procedure described here is immediately generalized to the item-guessing model.
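To see why holding $g_j$ fixed lets the same machinery carry over: with $g_j$ constant, the derivative of $P_{ij}$ is the free response derivative scaled by $(1 - g_j)$. A small sketch under the normal ogive, with hypothetical function names:

```python
import math


def normal_ogive(y):
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def p_item_guessing(theta, a, b, g):
    """Three parameter item-guessing model: P = g + (1 - g) * F(b + a * theta)."""
    return g + (1.0 - g) * normal_ogive(b + a * theta)


def dp_dtheta_item_guessing(theta, a, b, g):
    """With g held constant, dP/dtheta = (1 - g) * a * f(y), f the normal density."""
    y = b + a * theta
    density = math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)
    return (1.0 - g) * a * density
```

As $\theta$ decreases, $P$ falls toward $g_j$ rather than toward zero, which is how this model absorbs guessing into the item.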
Maximum likelihood estimation of ability

Suppose we have estimates of the item parameters, discrimination and difficulty, of $n$ dichotomous items. These estimates might be values based on a previous calibration of the testing instrument or estimates from the previous cycle during calibration of the instrument. For the $i$th person's encounter with the $j$th item, let $r_{ij} = 1$ denote a correct response and $r_{ij} = 0$ denote an incorrect response. Further, let $\mathbf{r}_i' = (r_{i1}, r_{i2}, \ldots, r_{in})$ denote the response vector of person $i$. Thus, under the assumption of local independence (that responses of subjects with the same ability to different items are statistically independent), equation (3),

$$L_i = \Pr(\mathbf{r}_i \mid \theta_i) = \prod_{j=1}^{n} P_{ij}^{\,r_{ij}}\, Q_{ij}^{\,1 - r_{ij}}, \qquad Q_{ij} = 1 - P_{ij}, \tag{3}$$

is the joint probability function of the response vector for person $i$. To obtain the maximum likelihood (m.l.) estimate of $\theta_i$ we obtain the first and second partial derivatives of the log of equation (3), $\ell = \log L_i$. These equations (omitting the subscripts) are:



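$$\ell_\theta = \frac{\partial \ell}{\partial \theta} = \sum_{j=1}^{n} \frac{r - P}{PQ}\,\frac{\partial P}{\partial \theta} \tag{4}$$

and

$$\ell_{\theta\theta} = \frac{\partial^2 \ell}{\partial \theta^2} = \sum_{j=1}^{n} \left[ \frac{r - P}{PQ}\,\frac{\partial^2 P}{\partial \theta^2} - \left( \frac{1}{PQ} + \frac{(r - P)(Q - P)}{(PQ)^2} \right) \left( \frac{\partial P}{\partial \theta} \right)^2 \right]. \tag{5}$$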
Considering equation (4), we see that the equation $\ell_\theta = 0$ is not easily solved for an explicit statement of $\theta_i$. However, the solution is available by means of an iterative process. For example, Newton-Raphson iteration allows us to obtain a maximum likelihood estimate of $\theta_i$. For one variable the estimate of this parameter at the $(k+1)$th iteration, $\theta_i^{(k+1)}$, is given by equation (6):

$$\theta_i^{(k+1)} = \theta_i^{(k)} - \frac{\ell_\theta}{\ell_{\theta\theta}}, \tag{6}$$

with the two partials being evaluated at the previous estimate of $\theta_i$, $\theta_i^{(k)}$. This procedure is repeated until the correction, $\ell_\theta / \ell_{\theta\theta}$, is less in absolute value than some previously specified criterion, say .001.
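The iteration in equations (4) to (6) can be sketched as below for the normal ogive, for which $\partial P/\partial\theta = a f(y)$ and $\partial^2 P/\partial\theta^2 = -a^2 y f(y)$, with $f$ the standard normal density. The optional `p_cutoff` argument anticipates the A.R.R.G. rule of equation (15): an interaction whose current $P$ falls below the cutoff is skipped, which is what setting its derivatives to zero amounts to. The names, the probability clipping, and the iteration cap are assumptions of this sketch.

```python
import math


def _density(y):
    """Standard normal density f(y)."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def _ogive(y):
    """Normal ogive F(y), clipped away from 0 and 1 to keep P*Q positive."""
    p = 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))
    return min(max(p, 1e-12), 1.0 - 1e-12)


def estimate_ability(responses, items, theta0=0.0, p_cutoff=0.0, tol=1e-3, max_iter=50):
    """Newton-Raphson m.l. estimate of theta, equations (4)-(6).

    responses: 0/1 scores r_j; items: (a_j, b_j) pairs. Interactions with
    F(b_j + a_j * theta) < p_cutoff are skipped in every iteration."""
    theta = theta0
    for _ in range(max_iter):
        l1 = l2 = 0.0
        for r, (a, b) in zip(responses, items):
            y = b + a * theta
            P = _ogive(y)
            if P < p_cutoff:
                continue                                # interaction removed
            Q = 1.0 - P
            dP = a * _density(y)                        # dP/dtheta
            d2P = -a * a * y * _density(y)              # d2P/dtheta2, normal ogive
            l1 += (r - P) / (P * Q) * dP                # equation (4)
            l2 += ((r - P) / (P * Q) * d2P
                   - (1.0 / (P * Q) + (r - P) * (Q - P) / (P * Q) ** 2) * dP ** 2)  # (5)
        if l2 == 0.0:
            raise ValueError("no interactions retained; ability is not estimable")
        step = l1 / l2
        theta -= step                                   # equation (6)
        if abs(step) < tol:
            break
    return theta
```

All-correct or all-wrong patterns have no finite m.l. estimate; the iteration cap keeps the sketch from looping forever on them.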

Conditional estimation of item parameters by maximum likelihood

Conditional estimation of the item parameters, that is, the calibration of the instrument, also uses previously obtained estimates of the parameters not in question, in this case abilities. However, the time required to estimate the item parameters can be greatly reduced if the data are reassembled in a binomial form and the principle of local independence is relaxed somewhat. It is found to be expedient to assume that "subjects whose latent ability is in the 'neighborhood' of $\theta$ respond independently to different items. The purpose of this relaxation is to justify grouping subjects for whom provisional estimates of latent scores are similar. It is assumed that the actual latent scores of subjects in such groups are confined to a sufficiently small neighborhood to assure independent responses. The question of how small this neighborhood should be to justify the local independence assumption is left to further empirical study" (Bock, 1972, p. 37).

Under the relaxed assumption of local independence, we can order the subjects by ability and divide them into $q$ fractiles. The number of subjects per fractile, $N_i$, may be assumed to follow a specific distribution, for example $N(0,1)$, or the so-called "empirical" assumption can be made that there are an equal number of subjects in each fractile (Kolakowski and Bock, 1970, p. 5). In either case let $s_{ij}$ be the number of subjects in the $i$th fractile who got item $j$ correct, and let $\mathbf{s}_j' = [s_{1j}, s_{2j}, \ldots, s_{qj}]$, so that $\mathbf{s}_j'$ represents the vector of item responses across the $q$ fractiles. Then under the relaxed assumption of local independence, equation (7),

$$L_j = \Pr(\mathbf{s}_j) = \prod_{i=1}^{q} \frac{N_i!}{s_{ij}!\,(N_i - s_{ij})!}\, P_{ij}^{\,s_{ij}}\, Q_{ij}^{\,N_i - s_{ij}}, \tag{7}$$

is the joint probability function of the response vector for item $j$. As above, $P_{ij}$ is the response relation and is a function of the item parameters and the $\theta_i$ associated with the $i$th fractile.* The value to be used for $\theta_i$ in the estimation of the item parameters depends on the assumed distribution of abilities: if a normal distribution is assumed for $\theta$, the normal deviate corresponding to the centroid of each fractile is used; if an empirical assumption is made, the value of $\theta$ used is the median value in the fractile (Kolakowski and Bock, 1970, p. 5).

* The $\theta_i$ are standardized to a mean of zero and variance of one.

As in the case of ability estimates, we will use as our estimates of the item parameters, $a_j$ and $b_j$, those values which maximize $\ell = \log L_j$. In order to obtain joint estimates of these parameters, we obtain all first and second partial derivatives of $\ell$ with respect to the item parameters. These equations (omitting the subscripts) are:



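$$\ell_b = \frac{\partial \ell}{\partial b} = \sum_{i=1}^{q} N_i\, \frac{p - P}{PQ}\,\frac{\partial P}{\partial b}, \tag{8}$$

where $p_{ij} = s_{ij} / N_i$,

$$\ell_a = \frac{\partial \ell}{\partial a} = \sum_{i=1}^{q} N_i\, \frac{p - P}{PQ}\,\frac{\partial P}{\partial a}, \tag{9}$$

$$\ell_{bb} = \sum_{i=1}^{q} N_i \left[ \frac{p - P}{PQ}\,\frac{\partial^2 P}{\partial b^2} - \left( \frac{1}{PQ} + \frac{(p - P)(Q - P)}{(PQ)^2} \right) \left( \frac{\partial P}{\partial b} \right)^2 \right], \tag{10}$$

$$\ell_{ba} = \sum_{i=1}^{q} N_i \left[ \frac{p - P}{PQ}\,\frac{\partial^2 P}{\partial b\, \partial a} - \left( \frac{1}{PQ} + \frac{(p - P)(Q - P)}{(PQ)^2} \right) \frac{\partial P}{\partial b}\,\frac{\partial P}{\partial a} \right], \tag{11}$$

$$\ell_{aa} = \sum_{i=1}^{q} N_i \left[ \frac{p - P}{PQ}\,\frac{\partial^2 P}{\partial a^2} - \left( \frac{1}{PQ} + \frac{(p - P)(Q - P)}{(PQ)^2} \right) \left( \frac{\partial P}{\partial a} \right)^2 \right]. \tag{12}$$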
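A sketch of how the grouped data entering equations (7) to (9) might be assembled, under the "empirical" equal-count assumption with the median provisional ability representing each fractile, and the normal ogive (so $\partial P/\partial b = f(y)$ and $\partial P/\partial a = \theta f(y)$). Applying the A.R.R.G. screen to a whole fractile, as done here, is a simplifying assumption of the sketch; the text removes individual item-person interactions. All names are hypothetical.

```python
import math
from statistics import median


def _density(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def _ogive(y):
    p = 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))
    return min(max(p, 1e-12), 1.0 - 1e-12)


def group_into_fractiles(thetas, item_responses, q):
    """Sort subjects by provisional ability and split them into q fractiles of
    (nearly) equal size. Returns (theta_i, N_i, s_ij) triples for one item, where
    theta_i is the fractile median and s_ij the number correct in the fractile."""
    order = sorted(range(len(thetas)), key=lambda k: thetas[k])
    groups = []
    for i in range(q):
        members = order[i * len(order) // q:(i + 1) * len(order) // q]
        if members:
            groups.append((median(thetas[k] for k in members), len(members),
                           sum(item_responses[k] for k in members)))
    return groups


def item_first_partials(groups, a, b, p_cutoff=0.0):
    """l_b and l_a of equations (8) and (9), with p = s / N in each fractile."""
    l_b = l_a = 0.0
    for theta, N, s in groups:
        y = b + a * theta
        P = _ogive(y)
        if P < p_cutoff:
            continue                          # fractile screened out
        w = N * (s / N - P) / (P * (1.0 - P))
        l_b += w * _density(y)                # dP/db = f(y)
        l_a += w * theta * _density(y)        # dP/da = theta * f(y)
    return l_b, l_a
```

A positive $\ell_b$ means the observed proportions correct exceed the model's, so the item should be made easier (larger $b_j$).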



Again we find that the equations of the first partials, $\ell_b = 0$ and $\ell_a = 0$, are not easily solved in closed form, and we again rely on a Newton-Raphson procedure. For the case of two variables this is accomplished by writing out the first two terms of the Taylor expansion of these two equations, as in equations (13), and solving the resulting system of linear equations in the corrections $\Delta b$ and $\Delta a$. These corrections are added to the $k$th stage estimates, $b_j^{k}$ and $a_j^{k}$, to form the $(k+1)$th stage estimates (Hildebrand, 1956, pp. 443-51). Omitting the $j$ subscripts we have

$$\begin{aligned} 0 &= \ell_b(a^k, b^k) + \ell_{bb}(a^k, b^k)\,\Delta b + \ell_{ba}(a^k, b^k)\,\Delta a \\ 0 &= \ell_a(a^k, b^k) + \ell_{ab}(a^k, b^k)\,\Delta b + \ell_{aa}(a^k, b^k)\,\Delta a \end{aligned} \tag{13}$$
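where $b^{k+1} = b^k + \Delta b$ and $a^{k+1} = a^k + \Delta a$.

As with ability estimates, this process is repeated until the corrections, $\Delta b$ and $\Delta a$, are both less in absolute value than some criterion. These equations can be restated more compactly in matrix terms:

$$-\begin{bmatrix} \ell_b \\ \ell_a \end{bmatrix} = \begin{bmatrix} \ell_{bb} & \ell_{ba} \\ \ell_{ab} & \ell_{aa} \end{bmatrix} \begin{bmatrix} \Delta b \\ \Delta a \end{bmatrix}, \tag{14}$$

or $-\boldsymbol{\ell} = \mathbf{H}\,\boldsymbol{\Delta}$.

Equation (14) yields the following corrections:

$$\Delta b = \frac{-\ell_{aa}\,\ell_b + \ell_{ba}\,\ell_a}{D}, \qquad \Delta a = \frac{\ell_{ba}\,\ell_b - \ell_{bb}\,\ell_a}{D},$$

which are added to the $k$th iteration estimates. Here,

$$D = \det(\mathbf{H}) = \ell_{aa}\,\ell_{bb} - \ell_{ab}\,\ell_{ba}.$$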



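The Mathematics of the A.R.R.G. Estimation Procedure

With the estimation procedure of the free response model firmly in hand, the adjustment implied for estimation of abilities with the A.R.R.G. procedure is simple and straightforward. To understand the implications of equation (2) for the m.l. estimation procedure outlined above, we need only consider the first and second partial derivatives of $P_{ij}$ with respect to $\theta_i$. Observe that the A.R.R.G. model implies the following:

$$\frac{\partial P_{ij}}{\partial \theta_i} = \begin{cases} \dfrac{\partial F(y_{ij})}{\partial \theta_i}, & F(y_{ij}) \ge P_c \\ 0, & F(y_{ij}) < P_c \end{cases} \qquad \frac{\partial^2 P_{ij}}{\partial \theta_i^2} = \begin{cases} \dfrac{\partial^2 F(y_{ij})}{\partial \theta_i^2}, & F(y_{ij}) \ge P_c \\ 0, & F(y_{ij}) < P_c \end{cases} \tag{15}$$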



As expected, one or the other of these derivatives is a multiplier in every term of the expressions for $\ell_\theta$ and $\ell_{\theta\theta}$ given in equations (4) and (5); consequently, the response to any item for which $F(y_{ij})$ is less than $P_c$ will not affect the estimation of the ability being considered.

The A.R.R.G. Procedure: Estimation of Item Parameters

As in the case of ability estimates, the modification of the procedure used to obtain estimates of the difficulty and discrimination, $b_j$ and $a_j$, under the free response model is most readily seen by considering the first and second partials of $P_{ij}$ with respect to these parameters. The form of these equations is identical to that given with respect to ability in equation (15). In effect we are again assuming that those item-subject interactions which produce provisional estimates of $P_{ij}$ deemed unreasonable will not produce relevant information for estimation of the item parameters; consequently, the derivatives associated with such interactions are zero.

Measurement Error

The estimate of the asymptotic variance of the m.l. estimator, $\hat{\theta}$, is obtained from Fisher's information function, which can be stated:

$$I(\theta) = -\mathcal{E}\left( \frac{\partial^2 \ell}{\partial \theta^2} \right),$$

where $\mathcal{E}$ indicates expectation.

It can be shown that asymptotically $\hat{\theta}$ has a normal distribution with mean $\theta$ and variance $1/I(\theta)$; i.e., $\hat{\theta} \sim N(\theta, 1/I(\theta))$.

Clearly, the larger the value of $I(\theta)$, the information, the more precise will be our estimate of ability. We will use the information contained in the ability estimates resulting from different models to make comparisons of the models.

While we estimate the precision of every ability estimate, for purposes of general comparison we would like to obtain a statement of the information contained in any test concerning a general level of ability. Subjects of identical ability should respond stochastically the same to a given item, so at expectation the differences between people vanish. We are then able to obtain a statement of the information contained in a test at a general ability level by observing that at expectation equation (5), the second partial of $\ell$ with respect to $\theta$, is simplified in that terms which contain $r - P$ vanish (see, e.g., Birnbaum, 1968). Therefore the statement of the information contained in a test at a general level of ability is:

$$I(\theta) = \sum_{j=1}^{n} \frac{1}{P_j Q_j} \left( \frac{\partial P_j}{\partial \theta} \right)^2.$$




As has been shown, for each individual item a test of deviation from the model can be obtained, since $Q_j$, equation (16), is distributed as a Pearsonian $\chi^2$ on $q - 2$ degrees of freedom (Bock and Jones, 1968, pp. 51-60). Finally, a test of fit for the test as a whole, $\chi^2_{\text{Test}}$, is obtained by summing the residual sums of squares, $Q_j$, over items and comparing that sum to a $\chi^2$ on $f = \sum_{j=1}^{n} (q - 2)$ degrees of freedom:

$$Q_j = \sum_{i=1}^{q} \frac{N_i\,(p_{ij} - P_{ij})^2}{P_{ij}\,Q_{ij}} \tag{16}$$

$$\chi^2_{\text{Test}} = \sum_{j=1}^{n} Q_j. \tag{17}$$
The test of fit enables us to identify the best cutoff point to use in applying this procedure. If we ignore random guessing responses when they are in fact present in the data, i.e., fit a free response model to data contaminated by random guessing, we would expect the resulting fit to be poorer than the fit resulting from a model which adequately accounts for random guessing responses. Within the present context, if we allow all the responses to remain in the estimation procedure the fit will be poorer than if we omit those responses which may result from random guessing: too many people at lower ability levels will get the more difficult items correct. On the other hand, if we remove too many responses from the estimation procedure, too few subjects will appear to be getting the more difficult items correct, and we again will observe a poorer fit. Consequently, by beginning with a cutoff point of $P_c = 0$ (i.e., a free response analysis) and increasing the cutoff point, we should observe an improved fit up to a point, followed by a poorer fit. The cutoff point which produces the best fit is the proper value for $P_c$.
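The search for the best cutoff described above amounts to the loop below, a minimal sketch in which `fit` is a hypothetical callable that recalibrates the items with a given $P_c$ and returns $\chi^2_{\text{Test}}$; following the text, the $\chi^2$ values are compared directly.

```python
from typing import Callable, Iterable, List, Tuple


def cutoff_grid(stop: float = 0.20, step: float = 0.05) -> List[float]:
    """Candidate cutoffs .00, .05, ..., stop, built from integer multiples to avoid drift."""
    n = int(round(stop / step))
    return [round(k * step, 10) for k in range(n + 1)]


def choose_cutoff(fit: Callable[[float], float],
                  candidates: Iterable[float]) -> Tuple[float, float]:
    """Return (P_c, chi2) for the candidate cutoff with the smallest chi-square."""
    best = None
    for p_c in candidates:
        chi2 = fit(p_c)
        if best is None or chi2 < best[1]:
            best = (p_c, chi2)
    if best is None:
        raise ValueError("no candidate cutoffs supplied")
    return best


# Chi-square values as read from the Guessing-data analyses in Table 2.
reported = {0.00: 371.25, 0.05: 271.01, 0.10: 399.89}
print(choose_cutoff(reported.__getitem__, [0.00, 0.05, 0.10]))  # (0.05, 271.01)
```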

The effect of such a procedure on the recovered item parameters is straightforward. Underestimating the cutoff, that is, retaining responses at a level where some responses are random guesses, results in underestimating the difficulty and discriminating power of the affected items. Overestimating the cutoff point results in an overestimation of the affected items' difficulty and discrimination.⁵

Note that fit is assessed with respect to items. Given a set of items already calibrated, the effect on ability estimation of removing more items than necessary to remove the effects of random guessing is simply less information and consequently less precise ability estimates.



5 In this formulation, large negative values of the difficulty parameter correspond to difficult items; consequently, over (under) estimation as used here refers to the absolute value of the difficulty parameter.
5.1 Simulated Data

In this section we examine a number of analyses of two kinds of simulated data: data sets simulating free response behavior, or Non-guessing data sets, and data sets simulating random guessing of the kind modeled by equation (2), i.e., Guessing data sets. A non-guesser's response vector is generated by assuming values for his ability parameter and for all item parameters, calculating the true probability of a correct response for each item-person interaction, and then comparing this probability to a random number between zero and one. A guesser's response vector is generated in the same manner, except that for those item-person interactions in which this calculated probability is less than the cutoff point, e.g., $P_c = .05$, each subject is assumed to guess in an essentially random manner. The same random sequence and the same set of abilities are used for both guessers and non-guessers, so that the response vectors of guessers and non-guessers differ only on the subset of items where the calculated probability of a correct response is less than the cutoff point. With 5 as the number of alternatives, a guesser will receive a correct response, in a random manner, on 20% of this subset of items, whereas a non-guesser will receive a correct response on less than 5% of such items.
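The generating scheme just described can be sketched as follows. The normal ogive, the assumed constant discrimination of 1.0, the seed, and the use of Python's `random.Random` are assumptions of this sketch, not details reported in the paper. One uniform number is drawn per interaction for every subject, so with the same seed a guesser and a non-guesser of equal ability differ only where $P_{ij} < P_c$.

```python
import math
import random


def normal_ogive(y):
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def simulate(thetas, items, is_guesser, p_cutoff=0.05, n_alternatives=5, seed=1):
    """Return one 0/1 response vector per subject.

    thetas: true abilities; items: (a_j, b_j) pairs; is_guesser: one flag per subject.
    Every interaction draws one uniform number u. A non-guesser is correct when
    u < P_ij. A guesser is scored the same way unless P_ij < p_cutoff, in which case
    the response is a random guess, correct when u < 1 / n_alternatives."""
    rng = random.Random(seed)
    data = []
    for theta, guesser in zip(thetas, is_guesser):
        row = []
        for a, b in items:
            P = normal_ogive(b + a * theta)
            u = rng.random()
            if guesser and P < p_cutoff:
                row.append(1 if u < 1.0 / n_alternatives else 0)
            else:
                row.append(1 if u < P else 0)
        data.append(row)
    return data


# Hypothetical item set echoing the first pair of data sets: difficulties
# -2.2, -2.1, ..., 2.2 (45 items) with an assumed constant discrimination of 1.0.
items = [(1.0, round(-2.2 + 0.1 * k, 1)) for k in range(45)]
```

A Guessing data set in the paper's sense would flag roughly a quarter of the subjects as guessers while keeping the same abilities and seed as its Non-guessing partner.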
A Non-guessing data set is composed entirely of non-guessing subjects, whereas a Guessing data set contains approximately twenty-five per cent guessing subjects.

Two pairs of data sets, each composed of one Non-guessing and one Guessing data set, were generated. Both sets in a pair utilized the same assumed item parameters. The first pair used a more or less idealized set of item parameters with difficulties from -2.2 to 2.2 in steps of .1 and constant discriminations. The second pair used item parameters obtained from a previously calibrated instrument with a similar range of difficulties, but with widely varying discriminations.

We present only the characteristics of analyses of the two sets of simulated data with constant discriminations, a set of free response or Non-guessing data and a set of Guessing data with a true $P_c$ equal to .05.* When variation in discriminating power is introduced into data contaminated by guessing, the results presented below for the constant discrimination data are replicated with one exception. Varying discriminations introduce a component of variance into the procedure which is not completely accounted for by the estimated measurement error. Out of 480 subjects we would expect 95% confidence limits to cover all but 24 of the true abilities. Forty-five abilities were in fact missed by their estimated confidence limits calculated from the $P_c = .05$ analysis of the varying discrimination data set.

* All analyses were performed assuming a normal distribution of abilities and the Normal Ogive Response Relation.
Sets of Guessing data were also generated simulating individual cutoff points of .10 and .20, with no significant changes in the results described in this paper; in particular, the best fit always occurred with $P_c = .05$. Since the test of fit is made with respect to the item parameters, increasing the point at which guessing begins for any individual does not produce much of an effect on this aspect of the estimation procedure: when the subjects are grouped for item parameter estimation, the proportion of correct guesses in most fractiles remains the same. Observe that the proportion of correct guesses is the product of the proportion of guessers and the probability of a correct random response; i.e., with 25% of the sample simulating guessers and a 5-choice test, the proportion of correct guesses on any item which admits guessing is .25 × 1/5 = .05.
Consequently, increasing the proportion of guessers in the sample will increase the proportion of correct guesses on each item in every fractile affected by guessing, and such an increase should affect the estimation procedure. To demonstrate the behavior of the model in this respect, sets of Guessing data were analyzed in which the proportion of guessers was set at 50%; in this case the best analysis occurred with $P_c$ at the implied level of correct guesses, .50 × 1/5 = .10. The implication of this for analyses of real data is that identification of the best cutoff point reflects the proportion of guessers in the sample and not the probability at which any individual begins to guess.
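The arithmetic behind these two simulation results is a single product, sketched here with exact fractions (the function name is hypothetical):

```python
from fractions import Fraction


def implied_cutoff(proportion_guessers, n_alternatives):
    """Expected proportion of correct random guesses on an item that admits guessing:
    (proportion of guessers) x (1 / number of alternatives)."""
    return Fraction(proportion_guessers) * Fraction(1, n_alternatives)


print(implied_cutoff(Fraction("0.25"), 5))  # 1/20, i.e. .05
print(implied_cutoff(Fraction("0.50"), 5))  # 1/10, i.e. .10
```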

Table 1 gives the values of statistics for the sample of abilities used to generate both sets of data, and the values of these statistics as calculated from the recovered sets of abilities from the analyses of the Non-guessing data with constant discriminations, utilizing different cutoff points. Table 2 presents the same values for analyses of the Guessing data. Figures 1 to 3 contain plots of the 45 pairs of recovered item parameters from the three analyses of the simulated Guessing data (A, Discriminations; B, Difficulty).
TABLE 1

MOMENTS OF THE TRUE AND RECOVERED SAMPLES OF ABILITIES
NON-GUESSING DATA (CONSTANT DISCRIMINATIONS)

N = 478
                                        Analysis
                        True      P_c = .00          P_c = .05
                                  (Free Response)
Mean                   -0.008     -0.003              0.012
  (S.E.)                          (0.046)
Variance                0.964      1.007              0.979
  (95% C.L.)                      (0.89, 1.15)       (0.87, 1.12)
Skewness               -0.201     -0.090             -0.146
  (S.E.)                          (0.112)
Kurtosis               -0.380     -0.292             -0.373
  (S.E.)                          (0.223)

Range Statistics
Minimum Ability        -2.66      -2.71              -2.63
Maximum Ability        +2.63      +2.70              +2.49
Range                   5.29       5.41               5.12
Chi-square (Test),                 258.48             311.46
  358 d.f.
TABLE 2

MOMENTS OF THE TRUE AND RECOVERED SAMPLES OF ABILITIES
GUESSING DATA (CONSTANT DISCRIMINATIONS)

                                            Analysis
                        True      P_c = .00       P_c = .05        P_c = .10
Mean                   -0.008     +0.020          +0.010           +0.026
  (S.E.)                                          (0.046)
Variance                0.964      1.089           0.975            0.971
  (95% C.L.)                      (0.96, 1.24)    (0.86, 1.12)     (0.75, 1.11)
Skewness               -0.201     +0.334 a        -0.083           -0.193
  (S.E.)                                          (0.112)
Kurtosis               -0.380     +0.431 b        -0.315           -0.327
  (S.E.)                                          (0.223)

Range Statistics
Minimum Ability        -2.66      -2.77           -2.60            -2.97
Maximum Ability        +2.63      +3.90           +2.68            +2.37
Range                   5.29       6.67            5.28             5.34
Chi-square (Test)                  371.25          271.01           399.89

a p = .004    b p = .053
In both tables we see that the best fit, indicated by the smallest $\chi^2$, occurs at the cutoff point which correctly reflects the proportion of random guessers in the sample of simulated examinees. (Consequently, with the A.R.R.G. procedure one is able to identify data which are completely free from random guessing, as in Table 1, in which case one proceeds with a free response analysis.) The plots of the estimated item parameters (Figures 1-3) from the three analyses of the Guessing data confirm the effect stated in section 4, that over (under) estimation of the appropriate cutoff point over (under) estimates (the absolute value of) the item parameters of those items affected by guessing, i.e., the difficult items.
5.2 Data In Situ

The first two of the three instruments analyzed in this paper may be considered subtests of the Survey Test of Educational Achievement (STEA), items for which were selected from the Sequential Tests of Educational Progress (STEP) (Cooperative Testing Service, 1969). These subtests, a reading achievement measure and a mathematics achievement measure, were each formed for this study by combining two of the five subject matter areas which comprise the STEA. The reading subtest is composed of the 25 items which make up the Reading and Mechanics of writing subject matter areas, and the MATH subtest is composed of the twenty-two items which make up the Mathematics computation and Mathematics basic concepts subject matter areas. For the purpose of this study each subtest is considered a separate test administered to a different group of examinees. The fifth grade math subtest and the tenth grade reading subtest were analyzed.

The STEA was given to a very large number of fifth grade and tenth grade students (total N = 39,000) throughout the southern United States. From each population a random sample of size 4000 was selected for analysis relating to the project for which the STEA was developed. From each of these samples a further random selection was made to reduce the size of each sample for this study to approximately 500.

The third instrument is the 30-item Word Knowledge subtest of the Metropolitan Achievement Test (MAT). The sample used in this study is a random sample of size 500 taken from the 17,000 fourth graders who participated in the Compensatory Reading Study.

For each test, values of $P_c$ from .00 (free response) to .20 in steps of .05 were used to obtain the best-fitting cutoff point, and for each test the best fit occurred with $P_c$ = .10.
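The search for the best-fitting cutoff is a one-dimensional grid search. The Python sketch below shows only the selection logic; `cutoff_grid`, `best_cutoff`, and `toy_fit` are hypothetical names, and `fit_chi_square` stands in for a complete A.R.R.G. analysis at a given $P_c$, which is not reproduced here.

```python
from typing import Callable, List


def cutoff_grid(start: float = 0.00, stop: float = 0.20, step: float = 0.05) -> List[float]:
    """Candidate cutoffs start, start + step, ..., stop (rounded to avoid float drift)."""
    n_steps = int(round((stop - start) / step))
    return [round(start + k * step, 2) for k in range(n_steps + 1)]


def best_cutoff(fit_chi_square: Callable[[float], float], grid: List[float]) -> float:
    """Return the cutoff whose analysis yields the smallest chi-square.

    fit_chi_square is a hypothetical stand-in for a full A.R.R.G. analysis at that cutoff.
    """
    fits = {p_c: fit_chi_square(p_c) for p_c in grid}
    return min(fits, key=fits.__getitem__)


def toy_fit(p_c: float) -> float:
    """Toy fit statistic with its minimum at .10 (hypothetical, not data from this paper)."""
    return 100.0 + 1000.0 * (p_c - 0.10) ** 2


print(cutoff_grid())                        # [0.0, 0.05, 0.1, 0.15, 0.2]
print(best_cutoff(toy_fit, cutoff_grid()))  # 0.1
```

Since the candidate set is small and each point requires a full re-analysis, an exhaustive scan is simpler and safer than any iterative optimizer here.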

The three instruments were analyzed by three models: a free response analysis, an A.R.R.G. analysis with $P_c$ = .10, and an item-guessing analysis. In a sense the inclusion of the free response analysis in this section is superfluous: the fit from the A.R.R.G. procedure, when applied to free response data, will indicate that the data are free from guessing and that all responses should be used in estimation. Therefore the free response analysis is, when warranted, included in the A.R.R.G. procedure.

The computer program used in performing the latent trait analyses is a modified version of NORMOG, Normal Ogive Item Analyser, written for the IBM 360/65 at the University of Chicago Computation Center (Kolakowski and Bock, 1970), and modified by the author.

These data are a part of the Compensatory Reading Project, Contract No. OEC-71-3715. Any conclusions are those of the author and are not necessarily endorsed by the U.S. Office of Education.
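To make the relation between the two analyses concrete, here is a minimal sketch of a screening step. It rests on an assumption not stated in this section: a response is set aside when the model-predicted probability of a correct answer for that person on that item falls below $P_c$. The normal ogive parameterization $\Phi(a(\theta - b))$, the item values, and the helper names are illustrative. With $P_c$ = .00 nothing is set aside, which is the sense in which the free response analysis is a special case. In an actual analysis the abilities would be re-estimated after screening; that loop is omitted.

```python
import math
from typing import List, Optional, Sequence, Tuple


def normal_ogive(theta: float, a: float, b: float) -> float:
    """P(correct) = Phi(a * (theta - b)) (assumed parameterization)."""
    return 0.5 * math.erfc(-a * (theta - b) / math.sqrt(2.0))


def screen_responses(
    responses: Sequence[int],
    theta: float,
    items: Sequence[Tuple[float, float]],
    p_c: float,
) -> List[Optional[int]]:
    """Hypothetical rule: keep a 0/1 response only when the predicted probability
    of success is at least p_c; otherwise return None (response set aside)."""
    screened: List[Optional[int]] = []
    for u, (a, b) in zip(responses, items):
        screened.append(u if normal_ogive(theta, a, b) >= p_c else None)
    return screened


# Illustrative (discrimination, difficulty) pairs: easy, medium, very hard.
items = [(1.0, -1.0), (1.0, 0.0), (1.0, 2.5)]
print(screen_responses([1, 0, 1], theta=-1.0, items=items, p_c=0.10))  # [1, 0, None]
print(screen_responses([1, 0, 1], theta=-1.0, items=items, p_c=0.0))   # [1, 0, 1]
```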

Table 3 contains the $\chi^2$ fits produced by the different analyses. In every set of data the improvement in the fit of the model which accrues from the use of the A.R.R.G. procedure is significant. The implications of this for latent trait item analysis are far-reaching.



TABLE 3
GOODNESS OF FIT ($\chi^2$)

                   No. of    Free Response       A.R.R.G.          Item-Guessing
Test               Items     DF    Chi-sq.       DF    Chi-sq.     DF    Chi-sq.
Word Knowledge       30      398   995.0*^       398   700.5*^     353   975.1*^
Reading              25      198   297.0*        198   229.7       173   261.4*
Mathematics          22      174   259.3^        174   217.5*      152   266.2^



*p < .01    ^p < .001
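To read the entries of Table 3 as significance levels, each $\chi^2$ can be converted to an approximate upper-tail probability. The sketch below uses the Wilson-Hilferty cube-root approximation to the chi-square distribution. This is a swapped-in approximation, not necessarily the method behind the table's markers. The values are transcribed from Table 3, and `chi2_upper_tail` is a hypothetical helper name.

```python
import math

# (chi-square, DF) pairs transcribed from Table 3.
TABLE_3 = {
    "Word Knowledge": {"Free Response": (995.0, 398), "A.R.R.G.": (700.5, 398), "Item-Guessing": (975.1, 353)},
    "Reading": {"Free Response": (297.0, 198), "A.R.R.G.": (229.7, 198), "Item-Guessing": (261.4, 173)},
    "Mathematics": {"Free Response": (259.3, 174), "A.R.R.G.": (217.5, 174), "Item-Guessing": (266.2, 152)},
}


def chi2_upper_tail(x: float, df: int) -> float:
    """Approximate P(X >= x) for X ~ chi-square(df) via Wilson-Hilferty:
    (X / df) ** (1/3) is roughly normal with mean 1 - 2/(9 df) and variance 2/(9 df)."""
    mean = 1.0 - 2.0 / (9.0 * df)
    sd = math.sqrt(2.0 / (9.0 * df))
    z = ((x / df) ** (1.0 / 3.0) - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of the standard normal


for test, fits in TABLE_3.items():
    for model, (chi2, df) in fits.items():
        print(f"{test:<15} {model:<14} chi2/df = {chi2 / df:.2f}  p ~ {chi2_upper_tail(chi2, df):.4f}")
```

Because this is an approximation, borderline entries should be checked against exact chi-square tail areas before drawing conclusions.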



The attraction of latent trait models results from their ability to admit measurement on a scale with a well-defined metric, which in turn results from the probabilistic assumption concerning the form of the response relation. Under either the free response or the item-guessing model, analyses of the Reading and Mathematics tests result in rejection of this assumption for each instrument as a whole. In this circumstance one recommended procedure for the item analyst is to investigate the individual items in an attempt to determine which items are failing to fit the model. It is suggested that such items be either removed from the instrument or returned to the item constructor for rewording (Lord and Novick, 1968). With each of these measuring instruments, however, the A.R.R.G. analysis reveals that either option may be contraindicated: the error in the item-analysis procedure lies not with the items, but with the failure of either model to adequately remove the effects of random guessing from the analysis. For a test in which significant lack of fit is found during item analysis, the A.R.R.G. procedure results in fewer items being examined and/or eliminated.
Parenthetically, we note that the item-guessing analysis fails to converge on some response vectors corresponding to very low ability levels. In the two STEA subtests, twenty subjects in reading and twenty-one subjects in math were inestimable with this analysis, while twelve subjects were lost in the analysis of the Word Knowledge instrument. This is an example of the result presented by Samejima (1973), indicating that under the item-guessing model, maximum likelihood estimates corresponding to certain response vectors may not be unique or may not even exist at finite values.

In this regard, in the analyses of all three instruments A.R.R.G. failed to produce an ability estimate for only one subject. This subject received credit for only two of the twenty-two math items, and attempted every one of these items. Since two out of twenty-two does not quite differ significantly from the chance percentage of 25 per cent (one-sided binomial p = .0606), we suggest that this subject guessed at most of these items and that measurement of him by this instrument is inappropriate.
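The chance-level comparison for this subject is a one-sided binomial tail. STEA items have four alternatives, so a pure guesser answers each item correctly with probability .25, and the question is how likely two or fewer correct answers out of twenty-two are. A short Python check reproduces the reported value; `binomial_lower_tail` is a hypothetical helper name.

```python
from math import comb


def binomial_lower_tail(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


# Two items correct out of twenty-two, each with a 1-in-4 chance of a lucky guess.
p_value = binomial_lower_tail(2, 22, 0.25)
print(round(p_value, 4))  # 0.0606
```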



With respect to guessing, STEA examinees are instructed as follows: "If a question seems to be too difficult, make the most careful guess you can" and "Wrong answers will not be counted against you."

STEA items have four alternatives.
6. The Information Structure

Appropriate comparisons of the information structure as estimated by the different analyses of the Guessing data provide an insightful basis for evaluating the different models. Consider the loss of precision, or information, that one might expect to result from random guessing. If we keep item parameters and subject parameters constant, the effect of random guessing on information should be to lower the amount of information concerning estimation of only those abilities in the lower portion of the ability continuum. Generally speaking, only lower ability people have the opportunity to do much random guessing; clearly, the farther up the ability continuum, the less the opportunity to guess. The information structure recovered from an analysis should reflect this situation. We will proceed to examine the information recovered by three analyses of simulated item responses contaminated by guessing: the free response analysis (F), the A.R.R.G. analysis (A), and the item-guessing analysis (I).
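The information curves compared below are test information functions. As a point of reference, here is a sketch for a normal ogive model, assuming the parameterization $P_j(\theta) = \Phi(a_j(\theta - b_j))$. The standard item information is $I_j(\theta) = a_j^2\,\phi(z)^2 / [P_j(1 - P_j)]$ with $z = a_j(\theta - b_j)$, and test information is the sum over items. Function names and item values are illustrative.

```python
import math
from typing import Sequence, Tuple


def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of one normal ogive item: a^2 phi(z)^2 / (P (1 - P))."""
    z = a * (theta - b)
    p = 0.5 * math.erfc(-z / math.sqrt(2.0))  # P(correct) = Phi(z)
    q = 0.5 * math.erfc(z / math.sqrt(2.0))   # 1 - P, computed directly for accuracy
    if p == 0.0 or q == 0.0:
        return 0.0  # information tends to zero far from the item's difficulty
    density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (a * density) ** 2 / (p * q)


def total_information(theta: float, items: Sequence[Tuple[float, float]]) -> float:
    """Test information is the sum of the item informations at theta."""
    return sum(item_information(theta, a, b) for a, b in items)


items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]  # illustrative (a, b) pairs
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(total_information(theta, items), 3))
```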

We will use as a standard the information provided by the free response analysis of the Non-Guessing (i.e., free response) data with constant discriminations. Figure 4 contains information curves from such a free response analysis of Non-Guessing data (S) and from the three analyses of the Guessing data. Recall that, except for the simulated random guessing, the two sets of data were generated using


See Lord (1968, p. 1014) for a description of the estimation procedure used for the guessing parameter in the three-parameter item-guessing model.
the same model parameters and an identical sequence of random numbers. Consequently we may use this curve as an approximation to the best one might be able to recover in estimated information for a test with these true item parameters and this distribution of abilities. (In fact, while the S information curve was drawn using the estimated item parameters, the true information curve, which by virtue of our privileged knowledge of the true item parameters is also available to us, is not distinguishable on this scale from the S curve. We note that since every information curve presented in this article was drawn using the estimated item parameters, each is in fact an estimated information curve. Of course each information curve is drawn with level of ability as the abscissa, so that estimates of ability do not enter into the determination of these curves.)

Our first comparison is between the A.R.R.G. analysis of Guessing data and our standard, the free response analysis of Non-Guessing data. The resulting information curves are as they should be: guessing has resulted in poorer precision at lower ability levels and equal precision elsewhere.

Our next comparisons concern the information curve of the free response analysis of Guessing data. This curve apparently reports more information concerning subjects in the lower half of the ability range than the A.R.R.G. analysis does. A comparison to the S curve shows that the free response analysis of guessing data reports as much information concerning lower ability subjects as the free response analysis of truly free response data (in this region the F curve is covered by the S curve). Substantively we know this must be an erroneous conclusion, as random guessing must result in a smaller amount of information for levels of ability where such behavior occurs. Proceeding along the ability continuum, we observe a marked decrease in the amount of information recovered by this analysis for higher level subjects, subjects with little or no opportunity to reduce information by guessing! The estimated item parameters of a free response analysis of guessing data provide an explanation (Figure 1): guessing lowers the estimated discriminations of the difficult items, thus lowering the amount of estimated information in the upper half of the ability continuum. Regarding the lower half, guessing doesn't affect estimation of easy items; consequently the estimated information from the free response analysis erroneously reports as much information at these levels as would be recovered from truly free response data. The A.R.R.G. analysis in this range omits the component of information resulting from low-ability-subject, high-difficulty-item interactions, thus more accurately representing the amount of information available for estimating these subject parameters.
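The omission described in the last sentence can be illustrated by dropping, at each ability level, the information contributed by items whose predicted probability of success falls below the cutoff. This sketch assumes normal ogive items $\Phi(a(\theta - b))$ and a hypothetical screening rule that compares predicted probability with $P_c$; it is not the author's estimation code. It only shows the qualitative pattern: less information at low abilities, unchanged information at high abilities.

```python
import math
from typing import Sequence, Tuple


def ogive_prob(theta: float, a: float, b: float) -> float:
    """P(correct) = Phi(a * (theta - b)) (assumed parameterization)."""
    return 0.5 * math.erfc(-a * (theta - b) / math.sqrt(2.0))


def item_info(theta: float, a: float, b: float) -> float:
    """Normal ogive item information a^2 phi(z)^2 / (P (1 - P))."""
    z = a * (theta - b)
    p = 0.5 * math.erfc(-z / math.sqrt(2.0))
    q = 0.5 * math.erfc(z / math.sqrt(2.0))
    if p == 0.0 or q == 0.0:
        return 0.0
    density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (a * density) ** 2 / (p * q)


def screened_information(theta: float, items: Sequence[Tuple[float, float]], p_c: float) -> float:
    """Sum item information only over items whose predicted P(correct) is at least p_c."""
    return sum(item_info(theta, a, b) for a, b in items if ogive_prob(theta, a, b) >= p_c)


items = [(1.0, b) for b in (-2.0, -1.0, 0.0, 1.0, 2.0)]  # illustrative items
for theta in (-2.0, 0.0, 2.0):
    full = screened_information(theta, items, 0.0)
    kept = screened_information(theta, items, 0.10)
    print(f"theta={theta:+.1f}  all responses={full:.3f}  P_c=.10: {kept:.3f}")
```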
Our final comparisons concern the information curve of an item-guessing analysis of the simulated guessing data (I). There is a clear improvement in recovered information at the lower ability levels by the A.R.R.G. analysis as contrasted with the item-guessing analysis. The difference between the two may perhaps be a general result of the item-guessing analysis' complete failure to take into account individual differences in guessing tendency. More interesting, perhaps, is that the item-guessing analysis of simulated guessing data apparently recovers more information at the upper ability levels than a free response analysis of identically generated (and therefore appropriate for this comparison) free response data. The item-guessing analysis seems to indicate that the introduction of guessing into the data at the lower half of the ability continuum results in an increase in the information at the upper half. Mathematically the reason for this paradox is clear: the item-guessing analysis overestimated the item parameters in the upper portion of the ability continuum (see Figure 5). Psychometrically, however, it is difficult to rationalize, at any ability level, an increase in information as a result of random guessing.
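The mathematical side of this paradox can be seen in the information function of Birnbaum's three-parameter logistic model, $I(\theta) = D^2 a^2 \frac{1 - P}{P}\left(\frac{P - c}{1 - c}\right)^2$ with $P = c + (1 - c)/(1 + e^{-Da(\theta - b)})$. At $\theta = b$ this reduces to $D^2 a^2 (1 - c) / [4(1 + c)]$, so information there is exactly proportional to $a^2$, and an inflated discrimination estimate inflates the reported information. The sketch assumes the conventional scaling constant D = 1.7 and hypothetical parameter values. It is not the estimation procedure used in the analyses above.

```python
import math

D = 1.7  # conventional logistic scaling constant (assumed)


def three_pl_probability(theta: float, a: float, b: float, c: float) -> float:
    """Birnbaum 3PL: P = c + (1 - c) / (1 + exp(-D a (theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def three_pl_information(theta: float, a: float, b: float, c: float) -> float:
    """Item information D^2 a^2 ((1 - P) / P) ((P - c) / (1 - c))^2."""
    p = three_pl_probability(theta, a, b, c)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


# Hypothetical item with b = 0 and c = .25: doubling a quadruples information at theta = b.
for a in (1.0, 2.0):
    print(a, round(three_pl_information(0.0, a, 0.0, 0.25), 4))
```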

In this figure we've used the simulated guessing data generated by the A.R.R.G. model. When comparing the information structure from the three analyses of data in situ, we observe the same general pattern in information curves that we've outlined here with simulated data. Of course we don't have an appropriate set of free response data with which to make comparisons of information curves, but we are able to obtain curves such as those in Figure 6: information curves from three analyses (free response, A.R.R.G., and item-guessing) of multiple choice data, data we assume contain random guessing.

Figures 7, 8, and 9 contain the information curves from the different analyses of the three tests described above. A comparison of these figures to Figure 6, which reflects an analysis of data known to be generated by the A.R.R.G. model, immediately indicates the similarity of the structure of all four figures. Below the mean ability, each figure is characterized by the ordering free response, A.R.R.G., item-guessing (from most to least information). At some point below the mean ability the information reported by the free response analysis begins to decline, and it falls below the other two curves for higher abilities. At approximately the mean ability the three-parameter item-guessing analysis begins to report more information than the A.R.R.G. analysis. As argued above, this is a result of overestimating the information, which in turn results from overestimation of the true item parameters by the item-guessing analysis. We feel that these similarities between analyses of data known to be generated by the A.R.R.G. model and the parallel figures based on analyses of real multiple choice data sets lend further support for the use of the A.R.R.G. model in analyses of data contaminated by guessing.
We may also compare the different analyses in terms of average information. If the underlying ability, $\theta$, is distributed normally with mean zero and variance one in the population, and the estimate of $\theta$ is scaled accordingly, the average information is

$$\bar{I} = \int_{-\infty}^{\infty} I(\theta)\,\phi(\theta)\,d\theta, \qquad \phi(\theta) = \frac{1}{\sqrt{2\pi}}\,e^{-\theta^2/2},$$

where $I(\theta)$ is the test information at ability $\theta$. This integral is readily evaluated by Gauss-Hermite quadrature. For these three tests the average information for each analysis is given in Table 4.
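A minimal sketch of the quadrature, assuming NumPy is available: `numpy.polynomial.hermite_e.hermegauss` returns nodes and weights for the weight function $e^{-\theta^2/2}$, so dividing the weighted sum by $\sqrt{2\pi}$ gives the average under the standard normal density. The normal ogive item information in the usage lines and the item parameters are illustrative assumptions, and `average_information` is a hypothetical helper name.

```python
import math
from typing import Callable

import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def average_information(info: Callable[[float], float], n_points: int = 40) -> float:
    """Approximate the integral of info(theta) * phi(theta), phi the N(0, 1) density."""
    nodes, weights = hermegauss(n_points)  # weight function exp(-theta**2 / 2)
    values = np.array([info(float(t)) for t in nodes])
    return float(np.dot(weights, values) / math.sqrt(2.0 * math.pi))


def ogive_item_information(theta: float, a: float, b: float) -> float:
    """Normal ogive item information a^2 phi(z)^2 / (P (1 - P)), z = a (theta - b)."""
    z = a * (theta - b)
    p = 0.5 * math.erfc(-z / math.sqrt(2.0))
    q = 0.5 * math.erfc(z / math.sqrt(2.0))
    if p == 0.0 or q == 0.0:
        return 0.0
    density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (a * density) ** 2 / (p * q)


items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]  # illustrative (a, b) pairs
print(average_information(lambda t: sum(ogive_item_information(t, a, b) for a, b in items)))
```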




Table 4
Average Information

Test                   Free Response    A.R.R.G.    Item-Guessing
MAT Word Knowledge         22.68          25.84         24.51
Reading                     6.32           7.84          6.63
Mathematics                 7.33           9.62          8.47

We see that for each test the A.R.R.G. analysis provides the greatest average information for ability estimation.